# Zero-shot Voice Cloning

Spark-TTS is an efficient text-to-speech system based on large language models (LLMs). It supports bilingual synthesis in Chinese and English, as well as zero-shot voice cloning.

Speech Synthesis Supports Multiple Languages

Spark-TTS is an advanced text-to-speech system based on large language models, capable of high-precision and natural-sounding speech synthesis.

Speech Synthesis Supports Multiple Languages

A multilingual model for emotional speech and singing voice synthesis, fine-tuned from Dia-1.6B, with support for voice cloning and emotion control.

Speech Synthesis Supports Multiple Languages

A cutting-edge large speech model based on the Llama architecture, designed for high-quality, empathetic text-to-speech generation.

Speech Synthesis

Transformers English

Openf5 TTS Base

OpenF5 TTS is an open-source text-to-speech model trained with the F5-TTS framework that supports zero-shot voice cloning. It is released under the Apache 2.0 license, which permits commercial use.

Speech Synthesis English

Orpheus 3b 0.1 GGUF

A high-quality text-to-speech model based on the Llama architecture, supporting emotion control and real-time streaming.

Speech Synthesis Supports Multiple Languages

Orpheus Exl2 4bit

A high-quality text-to-speech model based on the Llama architecture, supporting emotion control and voice cloning.

Speech Synthesis

Transformers English

YaTharThShaRma999

Orpheus 3b 0.1 Ft

A high-quality text-to-speech model based on the Llama architecture, supporting emotion control and voice cloning.

Speech Synthesis

Transformers English

Orpheus 3b 0.1 Ft

A cutting-edge large speech model based on the Llama architecture, designed for high-quality, empathetic text-to-speech generation.

Speech Synthesis

Transformers English

Zonos V0.1 Transformer

Zonos-v0.1 is a leading open-weight text-to-speech model trained on over 200,000 hours of multilingual speech data, with expressiveness and quality comparable to, or even surpassing, those of top-tier TTS service providers.

Speech Synthesis

Cosyvoice2 0.5B

CosyVoice is a text-to-speech (TTS) model that supports multilingual synthesis and voice conversion, providing high-quality speech synthesis.

Speech Synthesis

GPT SoVITS V1 Base

GPT-SoVITS (V1) is a multilingual text-to-speech foundation model supporting Chinese, English, and Japanese.

Speech Synthesis Supports Multiple Languages

Cosyvoice 300M SFT

CosyVoice is a text-to-speech (TTS) model that supports multilingual and multi-style voice synthesis.

Speech Synthesis

Voicecraft 330M TTSEnhanced

VoiceCraft is a PyTorch-based text-to-speech model that supports high-quality speech synthesis.

Speech Synthesis

Voicecraft 830M TTSEnhanced

VoiceCraft is a PyTorch-based text-to-speech model that supports high-quality speech synthesis.

Speech Synthesis

Voicecraft Giga330m

VoiceCraft is a PyTorch-based text-to-speech model that supports high-quality speech synthesis.

Speech Synthesis

Metavoice 1B V0.1

MetaVoice-1B is a 1.2-billion-parameter text-to-speech (TTS) foundation model trained on 100,000 hours of speech data. It specializes in generating emotional English speech and supports voice cloning and long-form synthesis.

Speech Synthesis English

Kinyarwanda YourTTS

An end-to-end, deep-learning-based Kinyarwanda text-to-speech (TTS) system with zero-shot learning capability, requiring only 1 minute of speech to introduce a new voice.

Speech Synthesis

Transformers Other

DigitalUmuganda

Kinyarwanda YourTTS V1

An end-to-end, deep-learning-based Kinyarwanda text-to-speech (TTS) system with zero-shot learning capability, requiring only 1 minute of speech to introduce a new voice.

Speech Synthesis

Transformers Other

DigitalUmuganda

© 2025 AIbase